The present study examines young adults from the National Longitudinal Study of Adolescent to Adult Health (AddHealth). The goals of the analysis are to 1) establish the relationship between being in an abusive relationship and an individual’s happiness in that relationship; 2) determine whether different forms of abuse have different impacts on the well-being of individuals in that relationship; and 3) determine how the association between being in an abusive relationship and an individual’s well-being differs by ethnicity, age, and sexual orientation.
Variables from AddHealth that will be used include: H4RD2Y (the total amount of time that the individual was involved in a romantic or sexual relationship with their partner), H4RD5 (the number of nights in an average week that the individual and their partner spent together), H4RD7B (satisfaction with the way the couple handles problems and disagreements), H4RD9 (the individual’s level of happiness in the relationship), H4RD18 (the frequency of physical abuse threats and attempts), H4RD20 (the frequency of physical injuries), H4RD21 (the frequency of non-consensual sexual activities), and H4TR1 (the number of people the individual has married).
First, the data is placed on the search path using the PDS package. The variables of interest are renamed, replacing the original codebook identifiers with more descriptive names, and are then selected and stored in the data frame NDF using the rename and select functions from the dplyr package.
library(PDS)
library(dplyr) # provides %>%, rename(), and select()
NDF <- addhealth_public4 %>%
rename(TimeSpent = h4rd2y, NightsSpent = h4rd5, SatisfactionWConflict = h4rd7b, HappinessLevel = h4rd9, PhysicalAbuseAttempts = h4rd18, PhysicalInjure = h4rd20, SexualAssaults = h4rd21, PeopleMarried = h4tr1, CigsSmoke = h4to6, SmokeFreq = h4to5 ) %>%
select(TimeSpent, NightsSpent, SatisfactionWConflict, HappinessLevel, PhysicalAbuseAttempts, PhysicalInjure, SexualAssaults, PeopleMarried, CigsSmoke, SmokeFreq)
summary(NDF)
TimeSpent NightsSpent SatisfactionWConflict HappinessLevel
Min. : 0.000 Min. : 0.00 Min. : 1.000 Min. :1.000
1st Qu.: 1.000 1st Qu.: 5.00 1st Qu.: 1.000 1st Qu.:1.000
Median : 4.000 Median :97.00 Median : 2.000 Median :1.000
Mean : 6.753 Mean :65.52 Mean : 2.643 Mean :2.416
3rd Qu.: 8.000 3rd Qu.:97.00 3rd Qu.: 3.000 3rd Qu.:2.000
Max. :98.000 Max. :98.00 Max. :98.000 Max. :8.000
NA's :1536 NA's :1536 NA's :1536 NA's :1536
PhysicalAbuseAttempts PhysicalInjure SexualAssaults PeopleMarried
Min. : 0.000 Min. : 0.00 Min. : 0.0000 Min. :0.0000
1st Qu.: 0.000 1st Qu.:97.00 1st Qu.: 0.0000 1st Qu.:0.0000
Median : 0.000 Median :97.00 Median : 0.0000 Median :0.0000
Mean : 1.401 Mean :84.38 Mean : 0.7701 Mean :0.5487
3rd Qu.: 0.000 3rd Qu.:97.00 3rd Qu.: 0.0000 3rd Qu.:1.0000
Max. :98.000 Max. :98.00 Max. :98.0000 Max. :8.0000
NA's :1536 NA's :1536 NA's :1536 NA's :1390
CigsSmoke SmokeFreq
Min. : 1.0 Min. : 0.000
1st Qu.: 15.0 1st Qu.: 0.000
Median :997.0 Median : 0.000
Mean :640.7 Mean : 9.258
3rd Qu.:997.0 3rd Qu.:25.000
Max. :998.0 Max. :98.000
NA's :1390 NA's :1390
NDF3 <- NDF %>%
select(HappinessLevel, PhysicalAbuseAttempts, SexualAssaults)
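The summary above mixes real measurements with large reserved values (for example 96 to 98, and 997 to 998 for CigsSmoke), which is why several medians sit at 97 or 997. The following is a minimal sketch on made-up data, assuming that values of 95 and above are non-response codes; the toy vector and the cutoff are illustrative assumptions, not AddHealth records. It shows how strongly such codes can distort a summary statistic:

```r
# Toy vector: nights per week (0-7) mixed with reserved codes (assumed >= 95)
nights <- c(0, 2, 3, 7, 1, 97, 97, 97, 98, 5)

# Median with the reserved codes left in
median_raw <- median(nights)

# Treat the reserved codes as missing, then take the median again
nights_clean <- ifelse(nights >= 95, NA, nights)
median_clean <- median(nights_clean, na.rm = TRUE)

# Share of the toy sample that carried a reserved code
share_reserved <- mean(nights >= 95)

c(raw = median_raw, clean = median_clean, reserved = share_reserved)
```

This is the motivation for setting the reserved codes to NA before any variable is summarized or plotted.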
In the next section of code, reserved non-response codes are set to NA and the levels of each factor are given informative labels in place of the numeric codes. For each of the variables TimeSpent, SatisfactionWConflict, HappinessLevel, PhysicalAbuseAttempts, PhysicalInjure, and SexualAssaults, the number of responses at each level is tabulated before and after recoding.
The first variable examined is the level of satisfaction with the way the couple handles problems and disagreements.
xtabs(~SatisfactionWConflict, data = NDF)
SatisfactionWConflict
1 2 3 4 5 96 98
1670 1743 711 587 233 11 13
NDF$SatisfactionWConflict[NDF$SatisfactionWConflict==96 | NDF$SatisfactionWConflict==98] <- NA
NDF$SatisfactionWConflict <- factor(NDF$SatisfactionWConflict, labels = c("strongly agree", "agree", "neither agree nor disagree", "disagree","strongly disagree"))[, drop = TRUE]
xtabs(~SatisfactionWConflict, data = NDF)
SatisfactionWConflict
strongly agree agree
1670 1743
neither agree nor disagree disagree
711 587
strongly disagree
233
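The same recode-then-label pattern is repeated for each variable in this section. As a sketch only, it could be wrapped in a small helper; `recode_labeled` is a hypothetical name that does not come from any package, the toy vector is made up, and the helper assumes the valid codes start at 1 (a variable coded from 0 would need different `levels`):

```r
# Hypothetical helper: set reserved codes to NA, then label the remaining levels
recode_labeled <- function(x, missing_codes, labels) {
  x[x %in% missing_codes] <- NA
  # Explicit levels keep the label order fixed even if a level is unobserved
  factor(x, levels = seq_along(labels), labels = labels)
}

# Toy responses on a 1-5 agreement scale with 96/98 as non-response codes
toy <- c(1, 2, 2, 5, 96, 3, 98, 4)
agree_labels <- c("strongly agree", "agree", "neither agree nor disagree",
                  "disagree", "strongly disagree")

toy_f <- recode_labeled(toy, missing_codes = c(96, 98), labels = agree_labels)
table(toy_f, useNA = "ifany")
```

Passing `levels` explicitly, rather than relying on the values that happen to be present, guards against labels shifting when one response category has no observations.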
The second variable examined is the individual’s level of happiness with their romantic relationship.
xtabs(~HappinessLevel, data = NDF)
HappinessLevel
1 2 3 6 7 8
2807 979 258 11 906 7
NDF$HappinessLevel[NDF$HappinessLevel>=6] <- NA
NDF3$HappinessLevel[NDF3$HappinessLevel>=6] <- NA
NDF$HappinessLevel <- factor(NDF$HappinessLevel, labels = c("very happy", "fairly happy", "not too happy"))[, drop = TRUE]
xtabs(~HappinessLevel, data = NDF)
HappinessLevel
very happy fairly happy not too happy
2807 979 258
The third variable examined is the frequency of physical abuse threats and attempts directed at the individual by their partner in the romantic relationship.
xtabs(~PhysicalAbuseAttempts, data = NDF)
PhysicalAbuseAttempts
0 1 2 3 4 5 6 7 96 98
3880 294 306 186 148 48 25 37 28 16
NDF$PhysicalAbuseAttempts[NDF$PhysicalAbuseAttempts>=9] <- NA
NDF3$PhysicalAbuseAttempts[NDF3$PhysicalAbuseAttempts>=9] <- NA
NDF$PhysicalAbuseAttempts <- factor(NDF$PhysicalAbuseAttempts, labels = c("never", "this has not happened in the past year, but it did happen before then ", "once in the last year of the relationship","twice in the last year of the relationship", "3 to 5 times in the last year of the relationship","6 to 10 times in the last year of the relationship", "11 to 20 times in the last year of the relationship", "more than 20 times in the last year of the relationship"))[, drop = TRUE]
xtabs(~PhysicalAbuseAttempts, data = NDF)
PhysicalAbuseAttempts
never
3880
this has not happened in the past year, but it did happen before then
294
once in the last year of the relationship
306
twice in the last year of the relationship
186
3 to 5 times in the last year of the relationship
148
6 to 10 times in the last year of the relationship
48
11 to 20 times in the last year of the relationship
25
more than 20 times in the last year of the relationship
37
The fourth variable examined is the frequency of physical injuries the individual sustained in the romantic relationship.
xtabs(~PhysicalInjure, data = NDF)
PhysicalInjure
0 1 2 3 4 5 6 7 96 97 98
395 74 86 32 35 17 2 12 17 4289 9
NDF$PhysicalInjure[NDF$PhysicalInjure>=9] <- NA
NDF$PhysicalInjure <- factor(NDF$PhysicalInjure, labels = c("never", "this has not happened in the past year, but it did happen before then ", "once in the last year of the relationship","twice in the last year of the relationship", "3 to 5 times in the last year of the relationship","6 to 10 times in the last year of the relationship", "11 to 20 times in the last year of the relationship", "more than 20 times in the last year of the relationship"))[, drop = TRUE]
xtabs(~PhysicalInjure, data = NDF)
PhysicalInjure
never
395
this has not happened in the past year, but it did happen before then
74
once in the last year of the relationship
86
twice in the last year of the relationship
32
3 to 5 times in the last year of the relationship
35
6 to 10 times in the last year of the relationship
17
11 to 20 times in the last year of the relationship
2
more than 20 times in the last year of the relationship
12
The fifth variable examined is the frequency of non-consensual sexual activity that the individual was forced to engage in within the relationship.
xtabs(~SexualAssaults, data = NDF)
SexualAssaults
0 1 2 3 4 5 6 7 96 98
4622 58 88 70 56 21 10 13 19 11
NDF$SexualAssaults[NDF$SexualAssaults>=9] <- NA
NDF3$SexualAssaults[NDF3$SexualAssaults>=9] <- NA
NDF$SexualAssaults <- factor(NDF$SexualAssaults, labels = c("never", "this has not happened in the past year, but it did happen before then ", "once in the last year of the relationship","twice in the last year of the relationship", "3 to 5 times in the last year of the relationship","6 to 10 times in the last year of the relationship", "11 to 20 times in the last year of the relationship", "more than 20 times in the last year of the relationship"))[, drop = TRUE]
xtabs(~SexualAssaults, data = NDF)
SexualAssaults
never
4622
this has not happened in the past year, but it did happen before then
58
once in the last year of the relationship
88
twice in the last year of the relationship
70
3 to 5 times in the last year of the relationship
56
6 to 10 times in the last year of the relationship
21
11 to 20 times in the last year of the relationship
10
more than 20 times in the last year of the relationship
13
The sixth variable examined is the total amount of time, in years, that the individual was involved in a romantic or sexual relationship with their partner.
xtabs(~TimeSpent, data = NDF)
TimeSpent
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
946 537 438 406 343 341 305 322 296 224 255 138 127 78 43 31 11 8
18 19 96 98
1 1 57 60
NDF$TimeSpent[NDF$TimeSpent == 96] <- NA
NDF$TimeSpent[NDF$TimeSpent == 98] <- NA
NDF$TimeSpent <- factor(NDF$TimeSpent, labels = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19))[, drop = TRUE]
xtabs(~TimeSpent, data = NDF)
TimeSpent
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
946 537 438 406 343 341 305 322 296 224 255 138 127 78 43 31 11 8
18 19
1 1
The seventh variable examined is the number of nights in an average week that the individual and their partner spent together.
xtabs(~NightsSpent, data = NDF)
NightsSpent
0 1 2 3 4 5 6 7 95 96 97 98
327 311 261 201 128 128 63 241 12 14 3273 9
NDF$NightsSpent[NDF$NightsSpent>=8] <- NA
NDF$NightsSpent <- factor(NDF$NightsSpent, labels = c(0, 1, 2, 3, 4, 5, 6, 7))[, drop = TRUE]
xtabs(~NightsSpent, data = NDF)
NightsSpent
0 1 2 3 4 5 6 7
327 311 261 201 128 128 63 241
The eighth variable examined is the number of people the individual has ever married (including the current spouse if the individual is currently married).
xtabs(~PeopleMarried, data = NDF)
PeopleMarried
0 1 2 3 4 6 8
2568 2331 197 9 1 7 1
NDF$PeopleMarried[NDF$PeopleMarried>=5] <- NA
NDF$PeopleMarried <- factor(NDF$PeopleMarried, labels = c(0, 1, 2, 3, 4))[, drop = TRUE]
xtabs(~PeopleMarried, data = NDF)
PeopleMarried
0 1 2 3 4
2568 2331 197 9 1
The barplots are all created with the package ggplot2. The barplots start with the defaults for the geom_bar and add more detail to the plot with each graph.
The first graph shows satisfaction with how the couple resolves conflicts or arguments in the relationship, ranging from strongly agree to strongly disagree. The distribution is skewed right. Compared with the other questions, a large share of respondents skipped this question.
library(ggplot2)
ggplot(data = NDF, aes(x = SatisfactionWConflict, fill = SatisfactionWConflict)) +
geom_bar() +
labs(title = "I (am/was) satisfied with the way we handle our problems and disagreements", x = "Response to the way individuals \n handle their problems and disagreements in a relationship") +
theme_bw() +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5)) +
guides(fill = guide_legend(title = "Satisfaction with the way we handle our problems and disagreements"))
The second graph shows the individual’s level of happiness with their romantic relationship: very happy, fairly happy, or not too happy. The distribution is skewed right.
ggplot(data = NDF, aes(x = HappinessLevel, fill = HappinessLevel)) +
geom_bar() +
labs(title = " In general, how happy are you in your relationship with {initials}?", x = "Response about how happy \n are the individuals in their relationship?") +
theme_bw() +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5)) +
guides(fill = guide_legend(title = "In general, how happy the individuals are in their relationship?"))
In the third through fifth graphs, respondents chose among eight options ranging from “never” to “more than 20 times in the last year of the relationship”. The third graph portrays the frequency of physical abuse threats and attempts directed at the individual by their partner. I omitted the NA responses so the graph is easier to read and more representative of the behavior. The distribution is skewed right.
ggplot(data = NDF, aes(x = PhysicalAbuseAttempts, fill = PhysicalAbuseAttempts)) +
geom_bar() +
labs(title = "How often (has/did) {initials} (threatened/threaten) you with violence, (pushed/push) \n or (shoved/shove) you, or (thrown/throw) something at you that could hurt? ", x = "Response to the frequency of physical abuse threats and attempts") +
theme_bw() +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 13), legend.key.size = unit(12, "pt")) + # legend.key.size is a theme() setting, not a guide_legend() argument
guides(fill = guide_legend(title = "The frequency of physical abuse threats and attempts"))
The fourth graph portrays the frequency of physical injuries the individual sustained in the romantic relationship. An overwhelming majority legitimately skipped this question, and it is hard to conjecture why. Therefore, I omitted that population so the graph is easier to read and more representative of the behavior. The distribution is skewed right. The majority responded “never”, but a considerable number of respondents still experienced physical injuries in their relationships.
ggplot(data = na.omit(NDF), aes(x = PhysicalInjure, fill = PhysicalInjure)) +
geom_bar() +
labs(title = "How often (have/did) you (had/have) an injury, such as \n a sprain, bruise, or cut because of a fight with {initials}? ", x = "Response to the frequency of physical injuries") +
theme_bw() +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 13)) +
guides(fill = guide_legend(title = "The frequency of physical injuries"))
The fifth graph portrays the frequency of non-consensual sexual activity that the individual was forced to engage in within the relationship. I omitted the NA responses so the graph is easier to read and more representative of the behavior. The distribution is skewed right. A larger majority responded “never” to this question than to the previous question on the frequency of physical injuries.
ggplot(data = na.omit(NDF), aes(x = SexualAssaults, fill = SexualAssaults)) +
geom_bar() +
labs(title = " How often (has/did) {initials} (insisted/insist) on or (made/make) \n you have sexual relations with (him/her) when you didn't want to?", x = "Response to the frequency of non-consensual sexual activities") +
theme_bw() +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 13)) +
guides(fill = guide_legend(title = "The frequency of non-consensual sexual activities"))
The sixth graph portrays the total amount of time, in years, that the individual was involved in a sexual or romantic relationship with their significant other. I omitted the NA responses so the graph is easier to read and more representative of the behavior. The mean is about 4.576 years and the median is 4 years. The distribution is skewed right.
NDF$TimeSpent <- gsub(",", "", NDF$TimeSpent) # remove comma
NDF$TimeSpent <- as.numeric(NDF$TimeSpent)
hist(NDF$TimeSpent, xlab = "Time Spent Together(in years)", main= "The amount of time that the individual \n spent with their significant other", col= "light blue", breaks=c(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19))
summary(NDF$TimeSpent)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 1.000 4.000 4.576 7.500 19.000 1653
fivenum(NDF$TimeSpent)
[1] 0.0 1.0 4.0 7.5 19.0
The seventh graph portrays the number of nights in an average week that the individual and their significant other spent together. I omitted the NA responses so the graph is easier to read and more representative of the behavior. The mean is about 2.803 nights and the median is 2 nights. The distribution is skewed right.
NDF$NightsSpent <- gsub(",", "", NDF$NightsSpent)
NDF$NightsSpent <- as.numeric(NDF$NightsSpent)
hist(NDF$NightsSpent, xlab = "Nights Spent Together On Average Week", main= "The nights in an average week that the individual \n spent with their significant other", col= "light blue", breaks=c(0,1,2,3,4,5,6,7))
summary(NDF$NightsSpent)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 1.000 2.000 2.803 5.000 7.000 4844
fivenum(NDF$NightsSpent)
[1] 0 1 2 5 7
The eighth graph is a bivariate graph showing, for each physical abuse attempts frequency, the fraction of individuals at each level of satisfaction with how conflict is handled.
ggplot(data = NDF, aes(x = PhysicalAbuseAttempts, fill = SatisfactionWConflict)) +
geom_bar(position = "fill") +
theme_bw() +
labs(x = "", y = "Fraction",
title = "Fraction of Physical Abuse Attempts Frequencies \ntargeting individuals in the relationship \nby their Satisfaction Level with Conflict") +
scale_fill_manual(values = c("red", "green", "orange", "blue", "violet"), name = "Satisfaction Level with Conflict Status") +
guides(fill = guide_legend(reverse = TRUE)) +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5, size = 14))
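The bars drawn with `position = "fill"` are rescaled so that each bar sums to 1. As a hedged illustration on made-up data (the `toy` data frame below is an assumption, not AddHealth data), the same within-group fractions can be computed directly with `xtabs` and `prop.table`:

```r
# Toy data: abuse frequency group and satisfaction response (made-up values)
toy <- data.frame(
  abuse = c("never", "never", "never", "never", "once", "once"),
  satisfaction = c("agree", "agree", "agree", "disagree", "agree", "disagree")
)

# Cross-tabulate counts: rows are the x-axis groups, columns the fill groups
counts <- xtabs(~ abuse + satisfaction, data = toy)

# Proportions within each abuse group (each row sums to 1),
# which correspond to the segment heights of a filled bar chart
fractions <- prop.table(counts, margin = 1)
fractions
```

Reading the fractions as a table alongside the plot can help when neighbouring segments are too thin to compare by eye.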
The next graph is a bivariate graph showing, for each frequency of non-consensual sexual activity (sexual assault), the fraction of individuals at each level of satisfaction with how conflict is handled.
ggplot(data = NDF, aes(x = SexualAssaults, fill = SatisfactionWConflict)) +
geom_bar(position = "fill") +
theme_bw() +
labs(x = "", y = "Fraction",
title = "Fraction of Sexual Assaults Frequencies \ntargeting individuals in the relationship \nby their Satisfaction Level with Conflict") +
scale_fill_manual(values = c("red", "green", "orange", "blue", "violet"), name = "Satisfaction Level with Conflict Status") +
guides(fill = guide_legend(reverse = TRUE)) +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5, size = 14))
The next graph is a bivariate graph showing, for each happiness level, the fraction of individuals at each level of satisfaction with how conflict is handled.
ggplot(data = NDF, aes(x = HappinessLevel, fill = SatisfactionWConflict)) +
geom_bar(position = "fill") +
theme_bw() +
labs(x = "", y = "Fraction",
title = "Fraction of Happiness Level of individuals in the relationship \nby their Satisfaction Level with Conflict") +
scale_fill_manual(values = c("red", "orange", "green", "blue", "violet"), name = "Satisfaction Level with Conflict Status") +
guides(fill = guide_legend(reverse = TRUE)) +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5, size = 14))
The next graph is a multivariate graph showing the fraction of individuals at each happiness level by the frequency of sexual assault, with one panel for each frequency of physical abuse attempts.
ggplot(data = NDF, aes(x = SexualAssaults, fill = HappinessLevel)) +
geom_bar(position = "fill") +
theme_bw() +
labs(x = "", y = "Fraction",
title = "Fraction of individuals at each happiness level \nby the frequency of sexual assaults, \nfaceted by the frequency of physical abuse attempts") +
scale_fill_manual(values = c("red", "yellow", "blue"), name = "Happiness Level Status") +
guides(fill = guide_legend(reverse = TRUE)) +
facet_grid(PhysicalAbuseAttempts ~ .) +
theme(axis.text.x = element_text(angle = 85, vjust = 0.5, size = 14))
ggplot(data = NDF, aes(x = PhysicalAbuseAttempts, fill = HappinessLevel)) +
geom_bar(position = "fill") +
theme_bw() +
labs(x = "", y = "Fraction",
title = "Fraction of Physical Abuse Frequencies \n targeting individuals in the relationship \nby their Happiness Level") +
scale_fill_manual(values = c("red", "yellow", "blue"), name = "Happiness Level Status:") +
guides(fill = guide_legend(reverse = TRUE)) +
theme(axis.text.x = element_text(angle = 70, vjust = 0.5, size = 14))
ggplot(data = NDF, aes(x = SexualAssaults, fill = HappinessLevel)) +
geom_bar(position = "fill") +
theme_bw() +
labs(x = "", y = "Fraction",
title = "Fraction of Sexual Assaults Frequencies \n targeting individuals in the relationship \nby their Happiness Level") +
scale_fill_manual(values = c("red", "yellow", "blue"), name = "Happiness Level Status:") +
guides(fill = guide_legend(reverse = TRUE)) +
theme(axis.text.x = element_text(angle = 70, vjust = 0.5, size = 14))
NDF2 <- NDF %>%
select(TimeSpent, PhysicalAbuseAttempts)
NDF2 = na.omit(NDF2)
NDF11 <- NDF %>%
select(TimeSpent, SexualAssaults)
NDF11 = na.omit(NDF11)
summary(NDF2)
TimeSpent
Min. : 0.00
1st Qu.: 1.00
Median : 4.00
Mean : 4.58
3rd Qu.: 8.00
Max. :19.00
PhysicalAbuseAttempts
never :3805
once in the last year of the relationship : 298
this has not happened in the past year, but it did happen before then : 294
twice in the last year of the relationship : 180
3 to 5 times in the last year of the relationship : 139
6 to 10 times in the last year of the relationship : 48
(Other) : 61
tapply(NDF2$TimeSpent, NDF2$PhysicalAbuseAttempts, mean)
never
4.396321
this has not happened in the past year, but it did happen before then
6.619048
once in the last year of the relationship
4.657718
twice in the last year of the relationship
5.200000
3 to 5 times in the last year of the relationship
4.410072
6 to 10 times in the last year of the relationship
4.145833
11 to 20 times in the last year of the relationship
4.240000
more than 20 times in the last year of the relationship
5.111111
tapply(NDF2$TimeSpent, NDF2$PhysicalAbuseAttempts, sd)
never
4.010775
this has not happened in the past year, but it did happen before then
3.473613
once in the last year of the relationship
3.941956
twice in the last year of the relationship
3.959854
3 to 5 times in the last year of the relationship
3.698439
6 to 10 times in the last year of the relationship
3.326167
11 to 20 times in the last year of the relationship
3.072458
more than 20 times in the last year of the relationship
3.955306
PAA <- rep(NA, length(NDF2$PhysicalAbuseAttempts))
This boxplot shows time spent together, in years, for each physical abuse attempts frequency. There are some outliers, but otherwise the group means and medians are similar to one another. The distributions are slightly skewed right, as shown in the graph below:
library(ggplot2)
ggplot(data = NDF2, aes(x = PhysicalAbuseAttempts, y = TimeSpent, fill = PhysicalAbuseAttempts)) +
geom_boxplot() +
theme_bw() +
guides(fill = "none") + # "none" replaces the deprecated fill = FALSE
labs(x = "Physical Abuse Frequency", y = "Time Spent Together(in years)", title = "Physical Abuse Frequency By Time Spent Together") +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 14))
library(ggplot2)
ggplot(data = NDF11, aes(x = SexualAssaults, y = TimeSpent, fill = SexualAssaults)) +
geom_boxplot() +
theme_bw() +
guides(fill = "none") + # "none" replaces the deprecated fill = FALSE
labs(x = "Sexual Assaults Frequency", y = "Time Spent Together(in years)", title = "Sexual Assaults By Time Spent Together") +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 14))
This violin plot shows time spent together, in years, for each physical abuse attempts frequency. The group means and medians are similar to one another. The distributions are slightly skewed right, as shown in the graph below:
ggplot(data = NDF2, aes(x = PhysicalAbuseAttempts, y = TimeSpent, fill = PhysicalAbuseAttempts)) +
geom_violin() +
theme_bw() +
guides(fill = "none") + # "none" replaces the deprecated fill = FALSE
labs(x = "Physical Abuse Frequency", y = "Time Spent Together(in years)", title = "Physical Abuse Frequency By Time Spent Together") +
theme(axis.text.x = element_text(angle = 75, vjust = 0.5, size = 14))
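\(H_o\): \(\mu_0 = \mu_1 = \cdots = \mu_7\), where \(\mu_k\) is the mean time spent together for physical abuse attempts group \(k\)
\(H_a\): at least one \(\mu_k\) differs from the others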
\(H_o\): There is no association between time spent and physical abuse attempts frequency in the relationship.
\(H_a\): There is an association between time spent and physical abuse attempts frequency in the relationship.
PAA[NDF2$PhysicalAbuseAttempts == "never"] <- "0"
PAA[NDF2$PhysicalAbuseAttempts == "this has not happened in the past year, but it did happen before then "] <- "1"
PAA[NDF2$PhysicalAbuseAttempts == "once in the last year of the relationship"] <- "2"
PAA[NDF2$PhysicalAbuseAttempts == "twice in the last year of the relationship"] <- "3"
PAA[NDF2$PhysicalAbuseAttempts == "3 to 5 times in the last year of the relationship"] <- "4"
PAA[NDF2$PhysicalAbuseAttempts == "6 to 10 times in the last year of the relationship"] <- "5"
PAA[NDF2$PhysicalAbuseAttempts == "11 to 20 times in the last year of the relationship"] <- "6"
PAA[NDF2$PhysicalAbuseAttempts == "more than 20 times in the last year of the relationship"] <- "7"
summary(PAA)
Length Class Mode
4825 character character
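# Name the columns directly and store the group codes as a factor,
# which TukeyHSD() needs regardless of the R version's stringsAsFactors default
DF <- data.frame(TimeSpent = NDF2$TimeSpent, PhysicalAbuseAttempts = factor(PAA))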
summary(DF)
TimeSpent PhysicalAbuseAttempts
Min. : 0.00 0 :3805
1st Qu.: 1.00 2 : 298
Median : 4.00 1 : 294
Mean : 4.58 3 : 180
3rd Qu.: 8.00 4 : 139
Max. :19.00 5 : 48
(Other): 61
mod1 <- aov(TimeSpent ~ PhysicalAbuseAttempts, data = DF)
mod1
Call:
aov(formula = TimeSpent ~ PhysicalAbuseAttempts, data = DF)
Terms:
PhysicalAbuseAttempts Residuals
Sum of Squares 1447.84 75331.29
Deg. of Freedom 7 4817
Residual standard error: 3.954571
Estimated effects may be unbalanced
summary(mod1)
Df Sum Sq Mean Sq F value Pr(>F)
PhysicalAbuseAttempts 7 1448 206.83 13.23 <2e-16 ***
Residuals 4817 75331 15.64
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Pval <- summary(mod1)[[1]][["Pr(>F)"]][[1]]
Pval
[1] 5.406079e-17
Since \(5.4060789\times 10^{-17}\) < 0.01, the null hypothesis is rejected. It can be concluded that there is a statistically significant association between the time spent together (in years) and the physical abuse attempts frequency.
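The F value reported by summary(mod1) can be reproduced from the sums of squares and degrees of freedom printed in the aov output. The sketch below copies those four numbers by hand (the variable names are illustrative) and uses `pf` for the upper-tail probability:

```r
# Values copied from the aov() output above
ss_between <- 1447.84   # Sum of Squares, PhysicalAbuseAttempts
ss_within  <- 75331.29  # Sum of Squares, Residuals
df_between <- 7
df_within  <- 4817

# Mean squares and the F ratio
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_stat <- ms_between / ms_within

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```

Because the inputs are rounded as printed, the result should agree with the reported F value of 13.23 only up to rounding.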
by(DF$TimeSpent, DF$PhysicalAbuseAttempts, mean, na.rm = TRUE)
DF$PhysicalAbuseAttempts: 0
[1] 4.396321
--------------------------------------------------------
DF$PhysicalAbuseAttempts: 1
[1] 6.619048
--------------------------------------------------------
DF$PhysicalAbuseAttempts: 2
[1] 4.657718
--------------------------------------------------------
DF$PhysicalAbuseAttempts: 3
[1] 5.2
--------------------------------------------------------
DF$PhysicalAbuseAttempts: 4
[1] 4.410072
--------------------------------------------------------
DF$PhysicalAbuseAttempts: 5
[1] 4.145833
--------------------------------------------------------
DF$PhysicalAbuseAttempts: 6
[1] 4.24
--------------------------------------------------------
DF$PhysicalAbuseAttempts: 7
[1] 5.111111
TukeyHSD(mod1)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = TimeSpent ~ PhysicalAbuseAttempts, data = DF)
$PhysicalAbuseAttempts
diff lwr upr p adj
1-0 2.22272699 1.4968754 2.9485785 0.0000000
2-0 0.26139749 -0.4599178 0.9827128 0.9572290
3-0 0.80367937 -0.1109822 1.7183409 0.1338936
4-0 0.01375131 -1.0217331 1.0492357 1.0000000
5-0 -0.25048730 -1.9921393 1.4911648 0.9998653
6-0 -0.15632063 -2.5624108 2.2497695 0.9999994
7-0 0.71479048 -1.2931619 2.7227429 0.9611603
2-1 -1.96132950 -2.9470162 -0.9756428 0.0000001
3-1 -1.41904762 -2.5538993 -0.2841960 0.0037807
4-1 -2.20897568 -3.4432813 -0.9746700 0.0000017
5-1 -2.47321429 -4.3399318 -0.6064967 0.0015392
6-1 -2.37904762 -4.8771574 0.1190621 0.0752629
7-1 -1.50793651 -3.6252828 0.6094098 0.3768112
3-2 0.54228188 -0.5896738 1.6742375 0.8323596
4-2 -0.24764618 -1.4792897 0.9839973 0.9987651
5-2 -0.51188479 -2.3768431 1.3530736 0.9913003
6-2 -0.41771812 -2.9145136 2.0790773 0.9996300
7-2 0.45339299 -1.6624025 2.5691885 0.9981396
4-3 -0.78992806 -2.1439058 0.5640497 0.6412543
5-3 -1.05416667 -3.0020834 0.8937501 0.7252662
6-3 -0.96000000 -3.5193549 1.5993549 0.9486170
7-3 -0.08888889 -2.2781583 2.1003805 1.0000000
5-4 -0.26423861 -2.2717251 1.7432478 0.9999259
6-4 -0.17007194 -2.7750517 2.4349079 0.9999994
7-4 0.70103917 -1.5413976 2.9434760 0.9812392
6-5 0.09416667 -2.8633735 3.0517068 1.0000000
7-5 0.96527778 -1.6785162 3.6090718 0.9554778
7-6 0.87111111 -2.2506776 3.9928998 0.9903917
There are five statistically significant pairwise differences, each involving the group “this has not happened in the past year, but it did happen before then”:
the difference between the means of “never” and “this has not happened in the past year, but it did happen before then” (adjusted \(p = 2.4034706\times 10^{-8}< 0.01\)).
the difference between the means of “once in the last year of the relationship” and “this has not happened in the past year, but it did happen before then” (adjusted \(p = 7.2198658\times 10^{-8}< 0.01\)).
the difference between the means of “twice in the last year of the relationship” and “this has not happened in the past year, but it did happen before then” (adjusted \(p = 0.0037807< 0.01\)).
the difference between the means of “3 to 5 times in the last year of the relationship” and “this has not happened in the past year, but it did happen before then” (adjusted \(p = 1.6979877\times 10^{-6}< 0.01\)).
the difference between the means of “6 to 10 times in the last year of the relationship” and “this has not happened in the past year, but it did happen before then” (adjusted \(p = 0.0015392< 0.01\)).
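Rather than reading the significant pairs off the printed table, they can be filtered programmatically. The sketch below is an illustration only: it uses the built-in PlantGrowth dataset as a stand-in for DF, and the threshold `alpha` is an assumed parameter (0.05 here, whereas the analysis above uses 0.01):

```r
# PlantGrowth ships with base R (datasets package); used here only as a stand-in
fit <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit)

# Each element of the result is a matrix with columns diff, lwr, upr, "p adj"
pairs <- tukey$group

# Keep only the comparisons whose adjusted p-value is below the threshold
alpha <- 0.05
significant <- pairs[pairs[, "p adj"] < alpha, , drop = FALSE]
significant
```

The same indexing applied to `TukeyHSD(mod1)$PhysicalAbuseAttempts` with `alpha <- 0.01` would be expected to return the pairs listed above.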
NDF4 <- data.frame(NDF3$PhysicalAbuseAttempts, NDF3$SexualAssaults, NDF3$HappinessLevel)
NDF4 <- NDF4 %>%
rename(PhysicalAbuseAttempts = NDF3.PhysicalAbuseAttempts, SexualAssaults = NDF3.SexualAssaults, HappinessLevel = NDF3.HappinessLevel)
summary(DF)
TimeSpent PhysicalAbuseAttempts
Min. : 0.00 0 :3805
1st Qu.: 1.00 2 : 298
Median : 4.00 1 : 294
Mean : 4.58 3 : 180
3rd Qu.: 8.00 4 : 139
Max. :19.00 5 : 48
(Other): 61
# Collapse each variable to a Yes/No indicator. Note: %in% returns FALSE for NA,
# so missing responses fall into the "No" group for the two abuse indicators
# and into the "Yes" group for HappinessLevel.
NDF3$PhysicalAbuseAttempts <- factor(ifelse(NDF3$PhysicalAbuseAttempts %in% c("1", "2", "3", "4", "5", "6", "7"), "Yes", "No"))
NDF3$SexualAssaults <- factor(ifelse(NDF3$SexualAssaults %in% c("1", "2", "3", "4", "5", "6", "7"), "Yes", "No"))
NDF3$HappinessLevel <- factor(ifelse(NDF3$HappinessLevel %in% c("2", "3"), "No", "Yes"))
NDF4$HappinessLevel <- factor(ifelse(NDF4$HappinessLevel %in% c("2", "3"), "No", "Yes"))
T1 <- xtabs(~HappinessLevel + PhysicalAbuseAttempts, data = NDF3)
T1
PhysicalAbuseAttempts
HappinessLevel No Yes
No 809 428
Yes 4651 616
T2 <- xtabs(~HappinessLevel + SexualAssaults, data = NDF3)
T2
SexualAssaults
HappinessLevel No Yes
No 1106 131
Yes 5082 185
\(H_o\): There is no relationship between happiness level and physical abuse attempts frequency in the relationship.
\(H_a\): There is a relationship between happiness level and physical abuse attempts frequency in the relationship
If Happiness Level (HL) and Physical Abuse Attempts (PAA) are independent, then \(P(HL \cap PAA) = P(HL)\times P(PAA)\). We use this rule to calculate the expected counts, one cell at a time, in T1:
\(\text{Expected Count}=\frac{\text{Column Total}\times \text{Row Total}}{\text{Table Total}}\)
chisq.test(T1)$expected
PhysicalAbuseAttempts
HappinessLevel No Yes
No 1038.441 198.559
Yes 4421.559 845.441
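The expected counts returned by chisq.test can be reproduced by hand from the row and column totals. This sketch re-enters the observed counts from T1 as a matrix (the object names are illustrative) and applies the expected-count rule to every cell at once with `outer`:

```r
# Observed counts copied from T1 above (matrix() fills column by column)
observed <- matrix(c(809, 4651, 428, 616), nrow = 2,
                   dimnames = list(HappinessLevel = c("No", "Yes"),
                                   PhysicalAbuseAttempts = c("No", "Yes")))

# Expected count = row total * column total / table total, for every cell
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 3)
```

For example, the "No"/"No" cell is 1237 × 5460 / 6504, which rounds to 1038.441.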
tab <- prop.table(T1, 1)
tab
PhysicalAbuseAttempts
HappinessLevel No Yes
No 0.6540016 0.3459984
Yes 0.8830454 0.1169546
When examining the association between happiness level (categorical response) and physical abuse attempts in the relationship (categorical explanatory), the table of row proportions shows that, in this sample, individuals classified as not happy were more likely to have experienced physical abuse threats or attempts (34.5998383%) than individuals classified as happy (11.6954623%).
We then test this observation with the Chi-Square test.
ChiS1 <- chisq.test(T1, correct = FALSE)
ChiS1
Pearson's Chi-squared test
data: T1
X-squared = 389.99, df = 1, p-value < 2.2e-16
ChiS11 <- chisq.test(T2, correct = FALSE)
ChiS11
Pearson's Chi-squared test
data: T2
X-squared = 108.56, df = 1, p-value < 2.2e-16
From the chi-square tests, we find that the p-values are \(8.3042086\times 10^{-87}\) and \(2.0285769\times 10^{-25}\), which are extremely small and well below 0.01, so both results are statistically significant. We therefore have very strong evidence to reject the null hypothesis in each case and conclude that there is a relationship between happiness level and physical abuse attempts in the relationship (\(\chi ^2 = 389.9934728, df = 1, p = 8.3042086\times 10^{-87} < 0.01\)) and that there is a relationship between happiness level and sexual assaults in the relationship (\(\chi ^2 = 108.5577038, df = 1, p = 2.0285769\times 10^{-25} < 0.01\)).
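To make the test statistic less of a black box, the Pearson statistic for the happiness by physical abuse table can be rebuilt from the observed and expected counts. This is only a sketch in base R: the counts are typed in from the T1 output, and the names `obs`, `expected`, `chi_sq`, `dof` and `p_value` are hypothetical.

```r
# Observed counts copied from the T1 output (filled column by column)
obs <- matrix(c(809, 4651, 428, 616), nrow = 2)

# Expected counts under independence
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E
chi_sq <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = dof, lower.tail = FALSE)

c(chi_sq = chi_sq, df = dof, p_value = p_value)
```

The statistic should match the value reported by chisq.test(T1, correct = FALSE), since no continuity correction is applied here either.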
T2 <- xtabs(~HappinessLevel + PhysicalAbuseAttempts, data = NDF4)
T2
PhysicalAbuseAttempts
HappinessLevel 0 1 2 3 4 5 6 7
No 803 108 113 85 68 23 11 20
Yes 3077 186 193 101 80 25 14 17
prop.table(T2, 1)
PhysicalAbuseAttempts
HappinessLevel 0 1 2 3 4
No 0.652315191 0.087733550 0.091795288 0.069049553 0.055239643
Yes 0.833197942 0.050365556 0.052261034 0.027349039 0.021662605
PhysicalAbuseAttempts
HappinessLevel 5 6 7
No 0.018683997 0.008935825 0.016246954
Yes 0.006769564 0.003790956 0.004603304
ChiS2 <- chisq.test(T2, correct = FALSE)
ChiS2
Pearson's Chi-squared test
data: T2
X-squared = 195.2, df = 7, p-value < 2.2e-16
A chi-square test of independence revealed that, among individuals in relationships, happiness level and the frequency of physical abuse attempts in the relationship were significantly associated, \(\chi ^2 = 195.1951337, df = 7, p = 1.1942317\times 10^{-38} < 0.01\).
n <- choose(ncol(T2), 2)  # number of pairwise comparisons between the columns of T2
limit <- 0.05/n
limit
[1] 0.001785714
There are now 28 post hoc tests, one for each pair of columns (categories) of the physical abuse attempts frequency variable. The usual significance level for judging a p-value is 0.05, but because 28 individual tests are carried out inside the overall comparison, we apply a Bonferroni adjustment and divide 0.05 by 28. A post hoc test is therefore significant only if its p-value is less than 0.0017857.
chisq.test(T2[, c(1, 2)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(1, 2)]
X-squared = 41.204, df = 1, p-value = 1.371e-10
chisq.test(T2[, c(1, 3)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(1, 3)]
X-squared = 43.719, df = 1, p-value = 3.792e-11
chisq.test(T2[, c(1, 4)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(1, 4)]
X-squared = 65.003, df = 1, p-value = 7.48e-16
chisq.test(T2[, c(1, 5)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(1, 5)]
X-squared = 53.631, df = 1, p-value = 2.419e-13
chisq.test(T2[, c(1, 6)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(1, 6)]
X-squared = 21.156, df = 1, p-value = 4.235e-06
chisq.test(T2[, c(1, 7)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(1, 7)]
X-squared = 8.1759, df = 1, p-value = 0.004245
chisq.test(T2[, c(1, 8)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(1, 8)]
X-squared = 24.574, df = 1, p-value = 7.152e-07
chisq.test(T2[, c(2, 3)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(2, 3)]
X-squared = 0.0024107, df = 1, p-value = 0.9608
chisq.test(T2[, c(2, 4)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(2, 4)]
X-squared = 3.8079, df = 1, p-value = 0.05101
chisq.test(T2[, c(2, 5)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(2, 5)]
X-squared = 3.4856, df = 1, p-value = 0.06191
chisq.test(T2[, c(2, 6)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(2, 6)]
X-squared = 2.1832, df = 1, p-value = 0.1395
chisq.test(T2[, c(2, 7)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(2, 7)]
X-squared = 0.52001, df = 1, p-value = 0.4708
chisq.test(T2[, c(2, 8)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(2, 8)]
X-squared = 4.1566, df = 1, p-value = 0.04147
chisq.test(T2[, c(3, 4)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(3, 4)]
X-squared = 3.7006, df = 1, p-value = 0.05439
chisq.test(T2[, c(3, 5)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(3, 5)]
X-squared = 3.3838, df = 1, p-value = 0.06584
chisq.test(T2[, c(3, 6)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(3, 6)]
X-squared = 2.1176, df = 1, p-value = 0.1456
chisq.test(T2[, c(3, 7)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(3, 7)]
X-squared = 0.49337, df = 1, p-value = 0.4824
chisq.test(T2[, c(3, 8)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(3, 8)]
X-squared = 4.0781, df = 1, p-value = 0.04344
chisq.test(T2[, c(4, 5)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(4, 5)]
X-squared = 0.0020259, df = 1, p-value = 0.9641
chisq.test(T2[, c(4, 6)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(4, 6)]
X-squared = 0.075509, df = 1, p-value = 0.7835
chisq.test(T2[, c(4, 7)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(4, 7)]
X-squared = 0.025652, df = 1, p-value = 0.8728
chisq.test(T2[, c(4, 8)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(4, 8)]
X-squared = 0.86468, df = 1, p-value = 0.3524
chisq.test(T2[, c(5, 6)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(5, 6)]
X-squared = 0.056595, df = 1, p-value = 0.812
chisq.test(T2[, c(5, 7)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(5, 7)]
X-squared = 0.03264, df = 1, p-value = 0.8566
chisq.test(T2[, c(5, 8)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(5, 8)]
X-squared = 0.78022, df = 1, p-value = 0.3771
chisq.test(T2[, c(6, 7)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(6, 7)]
X-squared = 0.10134, df = 1, p-value = 0.7502
chisq.test(T2[, c(6, 8)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(6, 8)]
X-squared = 0.31486, df = 1, p-value = 0.5747
chisq.test(T2[, c(7, 8)], correct = FALSE)
Pearson's Chi-squared test
data: T2[, c(7, 8)]
X-squared = 0.60324, df = 1, p-value = 0.4373
Post hoc comparisons of happiness level across the physical abuse attempts frequency categories revealed that rates of unhappiness were higher among those who had experienced physical abuse attempts than among those who never had. Of the 28 comparisons, 6 were significant at the adjusted level, all of them involving the “never” category:
-“never” and “this has not happened in the past year, but it did happen before then” (Column 0 and 1)
-“never” and “once in the last year of the relationship” (Column 0 and 2)
-“never” and “twice in the last year of the relationship” (Column 0 and 3)
-“never” and “3 to 5 times in the last year of the relationship” (Column 0 and 4)
-“never” and “6 to 10 times in the last year of the relationship” (Column 0 and 5)
-“never” and “more than 20 times in the last year of the relationship” (Column 0 and 7)
The comparison of “never” with “11 to 20 times in the last year of the relationship” (Column 0 and 6) was not significant after the adjustment (p = 0.004245 > 0.0017857), and no two categories other than “never” differed significantly from each other.
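The pairwise comparisons above can also be run in a single loop instead of one call per pair. The sketch below is one possible way to do this in base R, not the approach used in the original analysis: the counts are re-entered from the T2 output, the names `tab`, `pairs`, `limit`, `p_values` and `results` are hypothetical, and the columns are labelled with the frequency codes 0 to 7 rather than the positions 1 to 8 used in the calls above.

```r
# Counts copied from the T2 output (columns are frequency codes 0 to 7)
tab <- rbind(No  = c(803, 108, 113, 85, 68, 23, 11, 20),
             Yes = c(3077, 186, 193, 101, 80, 25, 14, 17))
colnames(tab) <- 0:7

# All pairs of column positions and the Bonferroni-adjusted threshold
pairs <- combn(ncol(tab), 2)
limit <- 0.05 / ncol(pairs)

# One chi-square test (no continuity correction) per pair of columns
p_values <- apply(pairs, 2, function(ix) {
  chisq.test(tab[, ix], correct = FALSE)$p.value
})

# Collect the results and flag the pairs below the adjusted threshold
results <- data.frame(col_a = colnames(tab)[pairs[1, ]],
                      col_b = colnames(tab)[pairs[2, ]],
                      p_value = p_values,
                      significant = p_values < limit)
results[results$significant, ]
```

Keeping the p-values in one data frame makes it easier to see which pairs pass the adjusted threshold without reading through each test output separately.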
NDF$PeopleMarried <- gsub(",", "", NDF$PeopleMarried)
NDF$PeopleMarried <- as.numeric(NDF$PeopleMarried)
ggplot(data = NDF, aes(x = PeopleMarried, y = TimeSpent)) +
geom_point() +
theme_bw() +
geom_smooth(method = "lm") +
labs(x = "People that the individual subject married in the past", y = "Time Spent Together(in years)", title = "Correlation Graph")
My codebook has considerably more categorical variables than quantitative variables. The quantitative variables I found that are relevant to my research yielded only 58 data points. Even so, the scatterplot suggests a positive relationship that is close to linear in form.
cor.test(NDF$PeopleMarried, NDF$TimeSpent, use = "complete.obs")
Pearson's product-moment correlation
data: NDF$PeopleMarried and NDF$TimeSpent
t = 29.845, df = 4847, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.3699609 0.4175227
sample estimates:
cor
0.3940055
r <- cor(NDF$PeopleMarried, NDF$TimeSpent, use = "complete.obs")
r
[1] 0.3940055
variance <- r^2
variance
[1] 0.1552404
Among the individuals in my sample, the correlation between the number of people they had married in the past (quantitative) and the time spent together in years (quantitative) was 0.3940055 (p < 0.0001), suggesting that only 15.5240357% (i.e. 0.3940055 squared) of the variance in time spent together can be explained by the number of people they had married in the past.
As mentioned above, the scatterplot suggests a positive relationship. However, the correlation between the two variables is r = 0.3940055, which is closer to 0 than to 1 and indicates a weak linear relationship between the two variables.
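The t statistic reported by cor.test can be recovered from r and the sample size. As a worked check, assuming n = 4849 complete pairs (the reported df of 4847 plus 2):

```latex
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.3940055\,\sqrt{4847}}{\sqrt{1-0.1552404}}
  \approx \frac{27.43}{0.9191}
  \approx 29.85
\]
```

This agrees with the reported t = 29.845 up to rounding, and it shows why a weak correlation can still be highly significant when the sample is this large.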
First, the data is placed on the search path using the alr4 package. The variables of interest, Velocity and Dist, are selected from domedata and stored in the data frame LRDF using the select function from the dplyr package. A data frame with 34 observations was used to investigate the relationship between the velocity of the baseball and the distance that the baseball travels.
library(alr4)
summary(domedata)
Cond Velocity Angle BallWt BallDia
Head:19 Min. :149.3 Min. :48.30 Min. :140.1 Min. :2.810
Tail:15 1st Qu.:154.1 1st Qu.:49.50 1st Qu.:140.1 1st Qu.:2.810
Median :155.5 Median :50.00 Median :141.0 Median :2.860
Mean :155.2 Mean :49.98 Mean :140.7 Mean :2.842
3rd Qu.:156.3 3rd Qu.:50.60 3rd Qu.:141.0 3rd Qu.:2.860
Max. :160.9 Max. :51.00 Max. :141.9 Max. :2.880
Dist
Min. :329.3
1st Qu.:347.8
Median :351.9
Mean :353.4
3rd Qu.:359.2
Max. :373.8
LRDF <- domedata %>%
select(Velocity, Dist)
cor.test(LRDF$Velocity, LRDF$Dist, use = "complete.obs")
Pearson's product-moment correlation
data: LRDF$Velocity and LRDF$Dist
t = 4.0574, df = 32, p-value = 0.000298
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.3047263 0.7693615
sample estimates:
cor
0.5828323
r2 <- cor(LRDF$Velocity, LRDF$Dist, use = "complete.obs")
The correlation of 0.5828323 is fairly strong (> 0.5); however, it does not fully characterize the linear relationship between two quantitative variables, because it only measures the strength and direction (here positive) of that relationship. Therefore, we summarize the linear relationship by fitting the line that best fits the linear pattern of the data between the Velocity and Dist variables.
library(ggplot2)
ggplot(data = LRDF, aes(x = Velocity, y = Dist)) +
geom_point(color = "purple") +
theme_bw() +
labs(x = "Velocity of the Baseball(feet/second)", y = "Distance Traveling(feet)")
The above graph is a scatter plot of the two variables in the LRDF data frame: velocity of the baseball (feet/second) and distance traveled by the baseball (feet). From the figure and the result of the correlation test above, one can see a positive, roughly linear relationship between these two variables. We will investigate the relationship between the two in more depth.
mod.lm <- lm(Dist~Velocity, data = LRDF)
summary(mod.lm)
Call:
lm(formula = Dist ~ Velocity, data = LRDF)
Residuals:
Min 1Q Median 3Q Max
-15.2292 -6.8905 0.1209 6.7111 12.9798
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.4767 90.4218 -0.149 0.882456
Velocity 2.3638 0.5826 4.057 0.000298 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.095 on 32 degrees of freedom
Multiple R-squared: 0.3397, Adjusted R-squared: 0.3191
F-statistic: 16.46 on 1 and 32 DF, p-value: 0.000298
plot(mod.lm, which = 1)
The residual plot does not show any clear pattern, so I am confident that the linear relationship can be summarized by fitting a line that best fits the linear pattern of the data.
MEAN <- apply(LRDF, 2, mean)
SD <- apply(LRDF, 2, sd)
MEAN
Velocity Dist
155.1882 353.3559
SD
Velocity Dist
2.418710 9.809556
coef(summary(mod.lm))
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.476660 90.4218220 -0.1490421 0.8824557262
Velocity 2.363791 0.5825903 4.0573810 0.0002980013
b <- coef(summary(mod.lm))[2, 1]
The slope of the fitted line is:
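\[b = r \cdot \frac{s_{Dist}}{s_{Velocity}}\]
\[\Leftrightarrow b = 0.5828323 \cdot \frac{9.809556}{2.4187105} = 2.3637909\]
where \(s_{Dist}\) and \(s_{Velocity}\) are the sample standard deviations of Dist and Velocity.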
This means that for every 1-unit increase in the explanatory variable, there is, on average, a 2.3637909-unit increase in the response variable. Specifically, for every additional foot per second of velocity, the maximum distance traveled by the baseball increases, on average, by 2.3637909 feet.
The intercept of the line is: \(a = \overline{Dist} - b \cdot \overline{Velocity} = 353.3558824 - 2.3637909 \times 155.1882353 = -13.4766605\)
And therefore the least squares regression line for this case is:
\(\widehat{Dist} = 2.3637909 \times Velocity - 13.4766605\)
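The slope and intercept can be verified from the summary statistics alone, without refitting the model. The following is a minimal sketch in base R that types in the correlation, means and standard deviations reported above; the variable names are illustrative.

```r
# Summary statistics copied from the output above
r      <- 0.5828323     # correlation between Velocity and Dist
mean_x <- 155.1882353   # mean of Velocity
mean_y <- 353.3558824   # mean of Dist
sd_x   <- 2.4187105     # standard deviation of Velocity
sd_y   <- 9.809556      # standard deviation of Dist

# Least squares slope and intercept from the summary statistics
b <- r * sd_y / sd_x
a <- mean_y - b * mean_x

c(intercept = a, slope = b)
```

Up to rounding of the inputs, these values should agree with the coefficients reported by coef(summary(mod.lm)).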
ggplot(data = LRDF, aes(x = Velocity, y = Dist)) +
geom_point(color = "purple") +
theme_bw() +
labs(x = "Velocity of the Baseball(feet/second)", y = "Distance Traveling(feet)") +
geom_smooth(method = "lm", se = TRUE) +
labs(title = expression(hat(Y) == "2.3637909x − 13.4766605") )
The figure above shows the regression line fitted to the data in the scatter plot. It follows the linear pattern of the data quite well.
PV <- predict(mod.lm, newdata = data.frame(Velocity = 159))
PV
1
362.3661
Practically, to find the predicted maximum distance for a velocity of 159 feet/second, we plug Velocity = 159 into the regression line equation:
Predicted distance = (2.3637909 * 159) - 13.4766605 = 362.3660972. So 362.3660972 feet is our best prediction for the maximum distance when the velocity is 159 feet/second.
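The same calculation can be wrapped in a small helper so that other velocities can be tried. This is a sketch under the assumption that the rounded coefficients above are used; `predict_dist` and `in_range` are hypothetical helper names, not functions from alr4 or base R. The range check uses the minimum and maximum of Velocity from summary(domedata), since predictions outside the observed range are extrapolations and should be treated with caution.

```r
# Prediction from the fitted line, using the rounded coefficients reported above
predict_dist <- function(velocity, intercept = -13.4766605, slope = 2.3637909) {
  intercept + slope * velocity
}

# TRUE when the velocity lies inside the observed range (149.3 to 160.9 feet/second)
in_range <- function(velocity) {
  velocity >= 149.3 & velocity <= 160.9
}

predict_dist(159)        # prediction for the velocity used above
in_range(c(159, 170))    # 159 is inside the observed range, 170 is not
```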